# ARM Processor Overview

Hancheol Cho

### **ARM (Advanced RISC Machine)**



| Property   | Value                              |
|------------|------------------------------------|
| Bits       | 32-bit, 64-bit                     |
| Introduced | 1985                               |
| Design     | RISC                               |
| Type       | Register-Register                  |
| Branching  | Condition code, compare and branch |
| Open       | Proprietary                        |

ARM, originally Acorn RISC Machine, later Advanced RISC Machine, is a family of reduced instruction set computing (RISC) architectures for computer processors, configured for various environments. The **British** company ARM Holdings develops the architecture and licenses it to other companies, who design their own products that implement one of those architectures.

### **ARM Processor Family**



Source: arm.com

### ARM Architecture Classification

- **A**: The *Application* profile defines a VMSA (Virtual Memory System Architecture) based microprocessor architecture. It is targeted at high-performance processors capable of running full-featured operating systems. It supports the ARM and Thumb instruction sets.
- **R**: The *Real-time* profile defines a PMSA (Protected Memory System Architecture) based microprocessor architecture. It is targeted at systems that require deterministic timing and low interrupt latency. It supports the ARM and Thumb instruction sets.
- **M**: The *Microcontroller* profile provides low-latency interrupt processing accessible directly from high-level programming languages. It has a different exception handling model from the other profiles, implements a variant of the PMSA, and supports only a variant of the Thumb instruction set.

| Profile   | Architecture | Instruction Set | Processor        |
|-----------|--------------|-----------------|------------------|
| A-Profile | ARMv7-A      | A32, T32        | Cortex-A Series  |
| R-Profile | ARMv7-R      | A32, T32        | Cortex-R Series  |
| M-Profile | ARMv7-M      | T32             | Cortex-M Series  |
| M-Profile | ARMv6-M      | T32             | Cortex-M0 Series |

### **ARM Processor**

| Category               | Armv8                                            | Armv7                                            | Armv6                                          | Previous                             |
|------------------------|--------------------------------------------------|--------------------------------------------------|------------------------------------------------|--------------------------------------|
| **Cortex-A**           | **Armv8-A**                                      | **Armv7-A**                                      | **Armv6**                                      | **Armv5**                            |
| High performance       | Cortex-A73, Cortex-A75<br>Cortex-A57, Cortex-A72 | Cortex-A17<br>Cortex-A15                         |                                                |                                      |
| High efficiency        | Cortex-A53, Cortex-A55                           | Cortex-A9<br>Cortex-A8                           | Arm11MPCore<br>Arm1176JZ(F)-S<br>Arm1136J(F)-S |                                      |
| Ultra high efficiency  | Cortex-A35<br>Cortex-A32                         | Cortex-A7<br>Cortex-A5                           |                                                | Arm968E-S<br>Arm946E-S<br>Arm926EJ-S |
| **Cortex-R**           | **Armv8-R**                                      | **Armv7-R**                                      |                                                |                                      |
| Real time              | Cortex-R52                                       | Cortex-R8<br>Cortex-R7<br>Cortex-R5<br>Cortex-R4 | Arm1156T2(F)-S                                 |                                      |
| **Cortex-M**           | **Armv8-M**                                      | **Armv7-M**                                      | **Armv6-M**                                    | **Armv4**                            |
| High performance       |                                                  | Cortex-M7                                        |                                                |                                      |
| Performance efficiency | Cortex-M33                                       | Cortex-M4<br>Cortex-M3                           |                                                | Arm7TDMI<br>Arm920T                  |
| Lowest power and area  | Cortex-M23                                       |                                                  | Cortex-M0+<br>Cortex-M0                        |                                      |

Source: arm.com

## Cortex-R5 Application Example: Xilinx Zynq



#### **Cortex-M Processor**



|                              | Cortex-M0                 | Cortex-M0+                | Cortex-M3                 | Cortex-M4                               | Cortex-M7                                     |
|------------------------------|---------------------------|---------------------------|---------------------------|-----------------------------------------|-----------------------------------------------|
| Instruction set architecture | ARMv6-M<br>Thumb, Thumb-2 | ARMv6-M<br>Thumb, Thumb-2 | ARMv7-M<br>Thumb, Thumb-2 | ARMv7-M<br>Thumb, Thumb-2, DSP, FP (SP) | ARMv7-M<br>Thumb, Thumb-2, DSP, FP (SP or SP+DP) |

Source: arm.com

### **Cortex-M Performance**



Source: arm.com

### **Cortex-M Instruction Set support**



Source: arm.com

#### Cortex-M7



| Feature                      | Description |
|------------------------------|-------------|
| Architecture                 | Harvard |
| ISA Support                  | Armv7-M |
| Pipeline                     | 6-stage superscalar + branch prediction |
| DSP Extensions               | Single cycle 16/32-bit MAC<br>Single cycle dual 16-bit MAC<br>8/16-bit SIMD arithmetic<br>Hardware Divide (2-12 Cycles) |
| Floating-Point Unit          | Optional single and double precision floating point unit, IEEE 754 compliant |
| Interconnect                 | 64-bit AMBA4 AXI, AHB peripheral port |
| Instruction cache            | 0 to 64 kB, 2-way associative with optional ECC |
| Data cache                   | 0 to 64 kB, 4-way associative with optional ECC |
| Instruction TCM              | 0 to 16 MB with optional ECC |
| Data TCM                     | 0 to 16 MB with optional ECC |
| Memory Protection            | Optional 8 or 16 region MPU with sub regions and background region |
| Interrupts                   | Non-maskable Interrupt (NMI) + 1 to 240 physical interrupts |
| Interrupt Priority Levels    | 8 to 256 priority levels |
| Wake-up Interrupt Controller | Up to 240 Wake-up Interrupts |
| Sleep Modes                  | Integrated WFI and WFE Instructions and Sleep On Exit capability.<br>Sleep & Deep Sleep Signals.<br>Optional Retention Mode with Arm Power Management Kit |
| Debug                        | Integrated Instructions |
| Debug                        | Optional JTAG and Serial Wire Debug ports. Up to 8 Breakpoints and 4 Watchpoints. |
| Trace                        | Optional Instruction Trace (ETM), Micro Trace Buffer (MTB), Data Trace (DWT), and Instrumentation Trace (ITM) |

### 7.1 Why Cortex-M processors are easy to use

Although the Cortex-M processors are packed with features, they are also very easy to use. For instance, almost everything can be programmed in a high-level language such as C. Although there is a wide variety of Cortex-M processor-based products (e.g. with different memory sizes, peripherals, performance, packages, etc.), the consistency of the architecture makes it easy to start using a new Cortex-M processor once you have experience with one of them.

To make software development easier, and to enable better software reusability and portability, ARM developed the CMSIS-CORE, where CMSIS stands for Cortex Microcontroller Software Interface Standard. The CMSIS-CORE provides a standardized Hardware Abstraction Layer (HAL) for various features in the processors such as interrupt management control, using a set of APIs. The CMSIS-CORE is integrated in the device driver libraries from various microcontroller vendors, and is supported by various compilation suites.

Besides CMSIS-CORE, CMSIS also has a DSP software library (CMSIS-DSP). It provides various DSP functions and is optimized for the Cortex-M4 and Cortex-M7 processors, and it also supports other Cortex-M processors. Both CMSIS-CORE and CMSIS-DSP are free, can be downloaded from GitHub (CMSIS 4, CMSIS 5), and are supported by multiple tool vendors.

### Programmer's model



### **Real Time OS?**

#### **Context**

- The minimum data a thread/task needs in order to run
  - CPU registers, stack pointer, and so on



#### ARM State Program Status Registers



## **Context Switching**

Figure: context switch between thread t0 and thread t1 on a (virtual) processor over time. Each thread-control block holds the program counter, registers, stack pointer, etc.; the context of t0 is saved and the context of t1 is restored.
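The save/restore step of a context switch can be sketched in plain C. This is a simplified host-side model, not a real Cortex-M port: the types `cpu_context_t` and `tcb_t` and the function `context_switch` are hypothetical names chosen for illustration, and a real RTOS performs this step in assembly inside an exception handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified register file: the "context" of a thread. */
typedef struct {
    uint32_t r[13]; /* general-purpose registers R0-R12 */
    uint32_t sp;    /* stack pointer */
    uint32_t lr;    /* link register */
    uint32_t pc;    /* program counter */
    uint32_t psr;   /* program status register */
} cpu_context_t;

/* Hypothetical thread-control block: one saved context per thread. */
typedef struct {
    int           id;
    cpu_context_t ctx;
} tcb_t;

/* Save the running thread's registers, then load the next thread's. */
void context_switch(cpu_context_t *cpu, tcb_t *from, tcb_t *to)
{
    assert(cpu != NULL && from != NULL && to != NULL);
    from->ctx = *cpu; /* save:    CPU registers -> TCB of outgoing thread */
    *cpu = to->ctx;   /* restore: TCB of incoming thread -> CPU registers */
}
```

Because the whole register set is copied both ways, a thread that is switched back in resumes with exactly the register values it had when it was switched out.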



### Scheduler



Figure 4.1: Preemptive scheduling of tasks
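The selection rule behind preemptive scheduling can be sketched as follows: whenever the scheduler runs, it picks the highest-priority task that is ready, so a newly ready high-priority task displaces a lower-priority one. The names `task_t` and `scheduler_pick`, and the convention that a larger number means higher priority, are assumptions made for this sketch and do not come from any particular RTOS.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TASK_BLOCKED = 0, TASK_READY = 1 } task_state_t;

/* Hypothetical task descriptor for illustration only. */
typedef struct {
    const char  *name;
    unsigned     priority; /* assumption: larger value = more urgent */
    task_state_t state;
} task_t;

/* Return the index of the highest-priority READY task, or -1 if none is ready.
 * Ties are resolved in favor of the task with the lower index. */
int scheduler_pick(const task_t *tasks, size_t count)
{
    int best = -1;

    assert(tasks != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].state != TASK_READY) {
            continue;
        }
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

Calling `scheduler_pick` again after a task changes from blocked to ready models the preemption point: if the newly ready task has a higher priority than the running one, its index is returned and a context switch follows.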

## Scheduler



#### **RTOS**







### **Exception Vector**


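- The initial stack pointer value can be specified in the vector table
  - As a result, the startup code can also be written in C

Exception types by exception number:

| Exception number | ARMv6-M                               | ARMv7-M                                |
|------------------|---------------------------------------|----------------------------------------|
| 16 and above     | Device-specific interrupts (up to 47) | Device-specific interrupts (up to 255) |
| 15               | SysTick                               | SysTick                                |
| 14               | PendSV                                | PendSV                                 |
| 13               | Not used                              | Not used                               |
| 12               | Not used                              | Debug Monitor                          |
| 11               | SVC                                   | SVC                                    |
| 7 to 10          | Not used                              | Not used                               |
| 6                | Not used                              | Usage Fault                            |
| 5                | Not used                              | Bus Fault                              |
| 4                | Not used                              | MemManage (fault)                      |
| 3                | HardFault                             | HardFault                              |
| 2                | NMI                                   | NMI                                    |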

| Vector Table                   | Vector address<br>(initial) |
|--------------------------------|-----------------------------|
| Interrupt #239 vector          | 0x000003FC                  |
| Interrupt #31 vector           | 0x000000BC                  |
| Interrupt #1 vector            | 0x00000044                  |
| Interrupt #0 vector            | 0x00000040                  |
| SysTick vector                 | 0x0000003C                  |
| PendSV vector                  | 0x00000038                  |
| Not used                       | 0x00000034                  |
| Debug Monitor vector           | 0x00000030                  |
| SVC vector                     | 0x0000002C                  |
| Not used                       | 0x00000028                  |
| Not used                       | 0x00000024                  |
| Not used                       | 0x00000020                  |
| SecureFault (ARMv8-M Mainline) | 0x0000001C                  |
| Usage Fault vector             | 0x00000018                  |
| Bus Fault vector               | 0x00000014                  |
| MemManage vector               | 0x00000010                  |
| HardFault vector               | 0x0000000C                  |
| NMI vector                     | 0x00000008                  |
| Reset vector                   | 0x00000004                  |
| MSP initial value              | 0x00000000                  |

Bit 0 of each exception vector is set to 1 to indicate Thumb state.
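The addresses in the vector table follow a simple rule: each entry is 4 bytes wide, so exception number N sits at offset 4 × N, and external interrupt #n is exception number 16 + n. The sketch below reproduces the table's addresses under the assumption that the table starts at address 0 (its initial location); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_ENTRY_SIZE   4u  /* each vector is one 32-bit word */
#define FIRST_IRQ_EXCEPTION 16u /* Interrupt #0 is exception number 16 */
#define MAX_IRQ_NUMBER      239u

/* Offset of an exception's vector from the start of the vector table. */
uint32_t exception_vector_offset(uint32_t exception_number)
{
    return exception_number * VECTOR_ENTRY_SIZE;
}

/* Offset of the vector for external interrupt #irq_number (0..239). */
uint32_t irq_vector_offset(uint32_t irq_number)
{
    assert(irq_number <= MAX_IRQ_NUMBER);
    return exception_vector_offset(FIRST_IRQ_EXCEPTION + irq_number);
}
```

For example, `irq_vector_offset(31)` evaluates to (16 + 31) × 4 = 0xBC and `exception_vector_offset(15)` to 0x3C, matching the Interrupt #31 and SysTick rows of the table.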

## Performance Comparison

#### Data Processing Speed

| Processor  | Dhrystone DMIPS/MHz (v2.1), official        | Dhrystone DMIPS/MHz (v2.1), full optimization           | CoreMark/MHz (v1.0) |
|------------|---------------------------------------------|---------------------------------------------------------|---------------------|
| Cortex-M0  | 0.84                                        | 1.21                                                    | 2.33                |
| Cortex-M0+ | 0.94                                        | 1.31                                                    | 2.42                |
| Cortex-M3  | 1.25                                        | 1.89                                                    | 3.32                |
| Cortex-M4  | 1.25                                        | 1.95                                                    | 3.40                |
| Cortex-M7  | 2.14                                        | 2.55                                                    | 5.01                |
| Cortex-M23 | 0.98                                        | -                                                       | 2.5                 |
| Cortex-M33 | 1.5                                         | -                                                       | 3.86                |

#### Interrupt Latency

|            | Interrupt latency (number of clock cycles) |
|------------|--------------------------------------------|
| Cortex-M0  | 16                                         |
| Cortex-M0+ | 15                                         |
| Cortex-M23 | 15                                         |
| Cortex-M3  | 12                                         |
| Cortex-M4  | 12                                         |
| Cortex-M7  | Typically 12, worst case 14                |
| Cortex-M33 | 12                                         |

Source: arm.com
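Because the latency is given in clock cycles, the corresponding time depends on the core clock: time = cycles / frequency. The sketch below converts a cycle count to nanoseconds; the function name `latency_ns` is hypothetical, and the clock values used in the note are examples rather than figures from the table.

```c
#include <assert.h>

/* Convert an interrupt latency in clock cycles to nanoseconds.
 * One cycle at f MHz lasts 1000 / f nanoseconds. */
double latency_ns(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

For instance, 12 cycles at 216 MHz (the STM32F746 clock quoted in the board specification later in this document) is roughly 55.6 ns, and the 14-cycle worst case at the same clock is roughly 64.8 ns.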

### **Interrupt Latency**

Interrupt Latency?



- Comparison with conventional microcontrollers





### ST MCU Lineup



### STM32CubeMX, STM32Cube



#### STM32Cube



#### DSP?

#### DSP Advantages

- 8 functional units (up to 8 instructions can execute in a single clock cycle)
- Very Long Instruction Word (VLIW)
- Data bus up to 256 bits wide



- Parallel execution through software pipelining

```
        MVKL    L1DCC, A0       ; \
||      MVKL    L1PCC, B0       ;  | Generate L1DCC pointer in A0
        MVKH    L1DCC, A0       ;  | and L1PCC pointer in B0
||      MVKH    L1PCC, B0       ; /
||      MVK     1b, A1          ; \ OPER encoding for 'freeze'
||      MVK     1b, B1          ; / in both A1 and B1.
        STW     A1, *A0         ; Write to L1DCC.OPER
||      STW     B1, *B0         ; Write to L1PCC.OPER
        LDW     *A0, A1         ; Get old freeze state into A1 from L1DCC
||      LDW     *B0, B1         ; Get old freeze state into B1 from L1PCC
        NOP     4
        ; At this point, L1D and L1P are frozen.
        ; The old value of L1DCC.OPER is in bit 16 of A1.
        ; The old value of L1PCC.OPER is in bit 16 of B1.
```

#### DSP?

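- DSP advantages
  - Powerful DMA capability
    - Some image processing can be done with DMA alone

- DSP disadvantages
  - Because instructions are long, an interrupt has a large impact on computation speed
    - Optimized code disables interrupts by default, so take care with optimization options when interrupts are used
    - Often paired with an ARM processor in a dual-processor configuration
  - Speed varies widely with the level of optimization
    - There are 8 functional units, but some combinations of instruction types cannot execute simultaneously
    - Compiler options alone can only optimize so far, so using the optimized libraries provided by TI is recommended
  - Performance depends heavily on the cache
    - Because both instructions and data are large, speed differs greatly between running from cache memory and running from external memory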

- STM32F746ZGT6: 216 MHz, Cortex-M7, 1 MB Flash, 320 KB SRAM
- Arduino Uno pin headers
- Arduino IDE development environment supported
- Dynamixel/OLLO/UART/CAN interfaces
- Battery input and power outputs (12 V/5 V/3.3 V)



- Used as the controller for TurtleBot3 Burger/Waffle



Usage example: https://youtu.be/-kBflS6wls



- Hardware resources
  - https://github.com/ROBOTIS-GIT/OpenCR-Hardware
- Firmware resources
  - https://github.com/ROBOTIS-GIT/OpenCR